AITopics

2508.0091

Genre: Research Report > New Finding (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Westenfelder, Finnian, Hemberg, Erik, Tulla, Miguel, Moskal, Stephen, O'Reilly, Una-May, Chiricescu, Silviu

LLM-Supported Natural Language to Bash Translation

arXiv.org Artificial IntelligenceFeb-7-2025

The Bourne-Again Shell (Bash) command-line interface for Linux systems has complex syntax and requires extensive specialized knowledge. Using the natural language to Bash command (NL2SH) translation capabilities of large language models (LLMs) for command composition circumvents these issues. However, the NL2SH performance of LLMs is difficult to assess due to inaccurate test data and unreliable heuristics for determining the functional equivalence of Bash commands. We present a manually verified test dataset of 600 instruction-command pairs and a training dataset of 40,939 pairs, increasing the size of previous datasets by 441% and 135%, respectively. Further, we present a novel functional equivalence heuristic that combines command execution with LLM evaluation of command outputs. Our heuristic can determine the functional equivalence of two Bash commands with 95% confidence, a 16% increase over previous heuristics. Evaluation of popular LLMs using our test dataset and heuristic demonstrates that parsing, in-context learning, in-weight learning, and constrained decoding can improve NL2SH accuracy by up to 32%. Our findings emphasize the importance of dataset quality, execution-based evaluation and translation method for advancing NL2SH translation. Our code is available at https://github.com/westenfelder/NL2SH

large language model, machine learning, natural language, (20 more...)

2502.06858

Country:

Africa > Mali (0.04)
Asia > Singapore (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJun-10-2024

CodeR: Issue Resolving with Multi-Agent and Task Graphs

Chen, Dong, Lin, Shaoxin, Zeng, Muhan, Zan, Daoguang, Wang, Jian-Gang, Cheshkov, Anton, Sun, Jun, Yu, Hao, Dong, Guoliang, Aliev, Artem, Wang, Jie, Cheng, Xiao, Liang, Guangtai, Ma, Yuchi, Bian, Pan, Xie, Tao, Wang, Qianxiang

The rapidly growing capability of Large Language Models (LLMs) is dramatically reshaping many industries [2, 3, 4]. The most recent release of GPT-4o [5] demonstrates a significant leap in multi-modal capabilities and artificial intelligence (AI)-human interaction, whilst maintaining the same level of text generation, reasoning, and code intelligence as GPT-4-Turbo [6]. LLMs can interact with humans and the world as humans do, it is considered a starting point for LLMs to take over tasks from humans or collaborate naturally with humans. Issue resolving is one of the software engineering tasks experimented with LLMs that is particularly relevant in practice. SWE-bench [1] collects 2,294 real-world issues from 12 popular Python libraries.

agent, information, reproduce, (13 more...)

2406.01304

Country:

Asia > Singapore (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Vo, Ngoc Phuoc An, Paulovicks, Brent, Sheinin, Vadim

Tackling Execution-Based Evaluation for NL2Bash

arXiv.org Artificial IntelligenceMay-10-2024

Given recent advancement of Large Language Models (LLMs), the task of translating from natural language prompts to different programming languages (code generation) attracts immense attention for wide application in different domains. Specially code generation for Bash (NL2Bash) is widely used to generate Bash scripts for automating different tasks, such as performance monitoring, compilation, system administration, system diagnostics, etc. Besides code generation, validating synthetic code is critical before using them for any application. Different methods for code validation are proposed, both direct (execution evaluation) and indirect validations (i.e. exact/partial match, BLEU score). Among these, Execution-based Evaluation (EE) can validate the predicted code by comparing the execution output of model prediction and expected output in system. However, designing and implementing such an execution-based evaluation system for NL2Bash is not a trivial task. In this paper, we present a machinery for execution-based evaluation for NL2Bash. We create a set of 50 prompts to evaluate some popular LLMs for NL2Bash. We also analyze several advantages and challenges of EE such as syntactically different yet semantically equivalent Bash scripts generated by different LLMs, or syntactically correct but semantically incorrect Bash scripts, and how we capture and process them correctly.

bash command, directory, filesystem, (11 more...)

2405.06807

Country:

North America > United States > New York (0.05)
North America > United States > California > Los Angeles County > Los Angeles (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial IntelligenceJan-4-2024

Evaluating Language-Model Agents on Realistic Autonomous Tasks

Kinniment, Megan, Sato, Lucas Jun Koba, Du, Haoxing, Goodrich, Brian, Hasin, Max, Chan, Lawrence, Miles, Luke Harold, Lin, Tao R., Wijk, Hjalmar, Burget, Joel, Ho, Aaron, Barnes, Elizabeth, Christiano, Paul

In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as "autonomous replication and adaptation" or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting ARA may be useful for informing measures around security, monitoring, and alignment. Additionally, once a system is capable of ARA, placing bounds on a system's capabilities may become significantly more difficult. We construct four simple example agents that combine language models with tools that allow them to take actions in the world. We then evaluate these agents on 12 tasks relevant to ARA. We find that these language model agents can only complete the easiest tasks from this list, although they make some progress on the more challenging tasks. Unfortunately, these evaluations are not adequate to rule out the possibility that near-future agents will be capable of ARA. In particular, we do not think that these evaluations provide good assurance that the ``next generation'' of language models (e.g. 100x effective compute scaleup on existing models) will not yield agents capable of ARA, unless intermediate evaluations are performed during pretraining. Relatedly, we expect that fine-tuning of the existing models could produce substantially more competent agents, even if the fine-tuning is not directly targeted at ARA.

agent, information, language model, (14 more...)

2312.11671

Genre:

Workflow (0.67)
Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Fu, Quchen, Teng, Zhongwei, Georgaklis, Marco, White, Jules, Schmidt, Douglas C.

NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

arXiv.org Artificial IntelligenceJun-18-2023

Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, with one based on the other. Both datasets involve scraping through known data sources (through platforms like stack overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or Bash Commands. This paper provides two contributions to research on synthesizing Bash Commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash Commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Since the generation pipeline does not rely on existing Bash Commands, the distribution and types of commands can be custom adjusted. We evaluate the performance of ChatGPT on this task and discuss the potential of using it as a data generator. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.

large language model, machine learning, natural language, (16 more...)

doi: 10.13052/jmltapissn.2023.002

2302.07845

Country:

North America > United States > Virginia > Williamsburg (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligenceDec-19-2022, 15:00:10 GMT

Train YOLO for Object Detection on a Custom Dataset using Python

I recently started working in the field of computer vision. And in these early days, I'm studying how the various algorithms of object detection work. Among the most well-known ones are R-CNN, Fast R-CNN, Faster R-CNN and of course YOLO. In this article, I want to focus on the last mentioned algorithm. YOLO is the state of the art in object detection and there are endless use cases where YOLO can be used.

dataset, makefile, yolo, (14 more...)

Country: Africa (0.05)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

#artificialintelligenceJun-15-2021, 00:53:52 GMT

11 Reasons To Learn Bash (A.K.A. Command Line)

But it's not just a skill for software devs -- learning bash can be valuable for anyone who works with data. In short, Bash is the Unix command-line interface (CLI). You'll also see it called the terminal, the command line, or the shell. It's a command language that allows us to work with files on our computers in a way that's far more efficient and powerful than using a GUI (graphical user interface). Making the switch from graphical user interfaces (GUIs) to a command-line interface can feel overwhelming.

command line, interface, keyboard shortcut, (12 more...)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence (1.00)

#artificialintelligenceMay-25-2021, 19:06:02 GMT

OpenAI-Powered Linux Shell

This is a basic Python shell (really, it's a fancy wrapper over the system shell) that takes a task and asks OpenAI for what Linux bash command to run based on your description. For safety reasons, you can look at the command and cancel before actually running it. To be clear, I'm not trying to convince you that having an AI model figure out what Linux command to run based on your written description is a good idea, but the commands that it generates are, well - watch the video if you want to see. There are several pre-canned ways of interacting with the models that OpenAI provides (the "GPT" models): completing a provided fragment, answering a question, generating "ideas" from a topic, summarizing a passage, etc. This shell uses the question-and-answer format and provides the model with an "example context" and examples of input and output.

bash command, openai api, openai-powered linux shell, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.90)

#artificialintelligenceNov-19-2020, 08:15:36 GMT

UMAP clustering in Python

The aim of this short Python tutorial is to introduce the uniform manifold approximation and projection (UMAP) algorithm, using 76,533 single-cell expression profiles from the human primary motor cortex. The data are available from the Cell Types database, which is part of the Allen Brain Map platform. The UMAP has quickly established itself as a go-to clustering tool well poised to expand our knowledge of various many things, including the human brain. I hope by the end of this tutorial you will have a broad understanding of the UMAP algorithm and how to implement it. Uniform manifold approximation and projection (UMAP)1 is a scalable and efficient dimension reduction algorithm that performs competitively among state-of-the-art methods such as t-SNE2, and widely applied for unsupervised clustering.

expression profile, human primary motor cortex, umap, (12 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.55)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)